Dev/prebuilt flash att#175

Open
mducducd wants to merge 3 commits into main from dev/prebuilt-flash-att
Collaborator

@mducducd mducducd commented May 5, 2026

Sets up installation of flash_attn 2.8.3 from pre-built wheels for x86_64 and aarch64. Installation now takes less than 10 minutes, so we can consider putting it in `--extra gpu`.

Wheels are taken from: https://github.com/mjun0812/flash-attention-prebuild-wheels

Tests passed on Planets but failed on GitHub. I hard-coded `--extra cpu` in `build.yml`.

@mducducd mducducd force-pushed the dev/prebuilt-flash-att branch 2 times, most recently from 3ee7f46 to 1738abf on May 5, 2026 15:35
@mducducd mducducd requested a review from FWao May 6, 2026 08:34
Member

@FWao FWao left a comment

@mducducd Please update the main README to reflect this, as a build is no longer necessary. Maybe we can leave an advanced / troubleshooting section at the bottom explaining how the user can switch back to a self-built flash-attn if there are errors or they want a different version.

Otherwise looks good, I tested on a DGX Spark.

Comment thread README.md
source .venv/bin/activate
```

**Option B — Build flash-attn from source.** Use this on macOS, or whenever the prebuilt wheel markers do not match your platform. The `nvcc` build can take a long time and use a lot of RAM.
Member

I think flash_attn does not run on macOS at all, right?

Collaborator Author

You are right! Mac only works with `--extra cpu`. I will adjust the instructions.
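
For illustration only, the platform distinction discussed in this thread could be sketched as below. This is not code from the PR: `suggested_extra` and `PREBUILT_ARCHS` are hypothetical names, the Linux-only condition is an assumption (the thread names only the x86_64 and aarch64 architectures), and the sketch does not check whether a GPU is actually present.

```python
import platform

# Hypothetical helper, not part of the PR. Architectures named in the
# PR description as having pre-built flash_attn wheels.
PREBUILT_ARCHS = {"x86_64", "aarch64"}


def suggested_extra(system: str, machine: str) -> str:
    """Return the extra name this thread suggests for a platform.

    Assumption: pre-built wheels apply to Linux only. Everything else,
    including macOS, falls back to the CPU extra.
    """
    if system == "Linux" and machine in PREBUILT_ARCHS:
        return "gpu"
    return "cpu"


if __name__ == "__main__":
    # Report the suggestion for the machine running this script.
    print(suggested_extra(platform.system(), platform.machine()))
```

On a Mac, `platform.system()` returns `"Darwin"`, so the helper falls through to `cpu`, matching the conclusion above.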

2 participants